-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
back to io/wait #368
back to io/wait #368
Conversation
@tarcieri we can continue the discussion here. Checking ruby's source code, I see the same "faulty" wait_* methods handling which caused the known bug. The main difference being, net/http closes connections by default. In the case of this gem, the preferred default is Keep-Alive, which is a design choice. This is the sequence of events which randomly triggers the issue (refers to https://github.com/httprb/http/blob/master/lib/http/timeout/per_operation.rb#L61-L72):
So, this might have been already fixed in ruby trunk (according to the code it now nly handles time outs, although implementation could mean something else), which means we can't rely on the So, I see these possible solutions:
tout = false
loop do
result = @socket.read_nonblock(size, :exception => false)
return :eof if result.nil?
raise TimeoutError, "Read timed out after #{read_timeout} seconds" if tout
return result if result != :wait_readable
unless @socket.to_io.wait_readable(read_timeout)
tout = true
end
end
What do you think? First option seems to demand less overhead, but second option seems more correct, but might demand more internal refactoring. |
@TiagoCardoso1983 for what it's worth, a GSoC student has been working on an implementation of a Java-like (fixed-sized) ByteBuffer type which bypasses Ruby I/O and does native I/O on native off-heap buffers (or at least that's the goal, some implementations still pending, but GSoC isn't over yet): https://github.com/celluloid/nio4r/pull/95/files I think, implemented correctly, this is probably the best way to get consistent I/O semantics. There are backends in C, Java, and pure Ruby. |
@tarcieri that's good to hear, but is it cancelling this fix? IMO it shouldn't be hard to get something which just works with the stdlib, even with the current weird |
@TiagoCardoso1983 I don't entirely understand what you're proposing in regard to the HTTP semantics you describe. I guess I'll have to further study your links to have an opinion. Regarding MRI async I/O errata, this is the sort of thing that really feels like it needs regression tests. I would really like to capture the previous regressions we've seen in an integration test. |
Going forward with it, and judginy by this and this, the Response object has access to the response headers before reading the http response body, hence the 'Message-Length'. We could therefore pass this value to the Response Body object, which means that the response body would know when to stop reading the body without relying in the IO events. |
@TiagoCardoso1983 I can't find any information on this Regardless, you still need to handle cases where these sorts of headers aren't provided by non-compliant servers. |
I already linked in a message above. And it's |
@TiagoCardoso1983 we already support |
Ok, looking at the code again it seems the response That said, not all servers send it or even know it in advance, so at best it can be used opportunistically. |
@tarcieri can you have a look at the rfc link I've linked a few messages above? There are some corner cases regarding the response body length which this gem might not address, besides that one. Regarding the example from the related issue (the https link to google play store), which issues a 404 status code, the response comes with "Transfer-Encoding: chunked". The corrected(?) link which responds to 200 has the Content-Length set. Use curl for these two links to see it for yourself:
The way I see it, even if not being fully compliant with the RFC, this gem could at least handle these two cases, in which when length is present, you only read as much, and when response is chunked, you read until the termination character (here's how it's defined in rack. I don't agree with what you said. Servers which do not know content length in advance, encode the response as chunked. If you decide to ignore those headers because some non-compliant servers exist, you are opening yourself to a DoS vector, in that you read the body indefinitely until the "malicious" server stops sending data (right?). |
Just for comparison sake, here's how net-http reads the body: https://github.com/ruby/ruby/blob/trunk/lib/net/http/response.rb#L280-L301 |
Yes, it should use |
99f2f83
to
4b2bdcf
Compare
@tarcieri I think I got it this time, and the failing example from #298 So, the The caveat on the
require 'socket'
require 'io/wait'
Thread.new do
server = TCPServer.new 2000 # Server bind to port 2000
loop do
client = server.accept # Wait for a client to connect
client.puts "Hello !"
client.puts "Time is #{Time.now}"
sleep 3
client.close
end
end
puts "server is starting..."
sleep 2
puts "connecting now"
s = TCPSocket.new 'localhost', 2000
puts "connected"
10.times do
k = s.wait_readable(2)
puts "wait: #{k.inspect}"
v = s.read_nonblock(8, exception: false)
puts "val: #{v.inspect}"
break if v == nil
end I will fill a bug to see what the core team thinks, but this makes indeed leveraging |
@TiagoCardoso1983 I believe the |
lib/http/errors.rb
Outdated
@@ -20,4 +20,7 @@ class TimeoutError < Error; end | |||
|
|||
# Header name is invalid | |||
class InvalidHeaderNameError < Error; end | |||
|
|||
# Header value is of unexpected format (similar to Net::HTTPHeaderSyntaxError) | |||
class HeaderSyntaxError < Error; end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of changing this to:
class HeaderError < Error; end
class InvalidHeaderNameError < HeaderError; end
class InvalidHeaderValueError < HeaderError; end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
humm, this one is tricky... I think the class of errors that can come from headers are not covered by these 2. Maybe the separator is wrong, for example. Also, this is in theory a ResponseError, as it's not a request header, and etc.
IMO I think the best solution could be just to create a simple HeaderError and use it everywhere, and add the details in the message. Unless I'm missing the motive for so much error granularity.
Let me know what you think, I can change to your suggestion for now, as soon as I come back to this. I work behind a proxy during the week, which is contatenating chunked responses, thereby blocking my use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm. I tend to agree that single class HeaderError < Error; end
should be enough.
But in order to make it backward compatible I propose to do this:
class HeaderError < Error; end
# @deprecated will be removed in v3.0.0
InvalidHeaderNameError = HeaderError
So I've reverted the exception for content length, and it returns nil, and the response can be instantiated. Which means that the body will be fetched until end of connection, potentially ad eternum. Besides the security reason there, I think that this feature doesn't conform to the HTTP/1.1 spec, and even net/http does it. I don't know why one needs to instantiate the response if the response/headers is/are invalid, but then again, I only contributed a few commits and you know more about the usage history of this gem. In case one wants to satisfy both requirements, the way the response is instantiated has to change, and this should maybe be the task of the core team. |
@TiagoCardoso1983 chalk it up to Postel's law. That said, the "security" issue you're describing exists for a compliant implementation which follows an endless chunked response and exists in every compliant HTTP/1.1 client. Looks like there are some conflicts that need to be resolved. |
lib/http/response.rb
Outdated
value = @headers[Headers::CONTENT_LENGTH] | ||
return unless value | ||
|
||
begin | ||
Integer(value) | ||
rescue ArgumentError | ||
nil | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you, please, return nil
in here. It will make rubocop a bit pleased ;))
2ddadff
to
3e43c3d
Compare
@tarcieri that is correct, I meant more as an unexpected security concern, as the protocol defines those use cases. My goal is just to document it in case future issues arise from it. |
lib/http/response.rb
Outdated
# Clause 3: "If a message is received with both a Transfer-Encoding | ||
# and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. | ||
return nil if @headers.include?(Headers::TRANSFER_ENCODING) || | ||
[email protected]?(Headers::CONTENT_LENGTH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would split those in two separate statements for sake of simplicity (took me a while to understand that this is correct statement -- was pretty sure there's a mistake) as it was.
return nil if @headers.include?(Headers::TRANSFER_ENCODING)
value = @headers[Headers::CONTENT_LENGTH]
return nil unless value
# ...
Calling #include?
and then #[]
will involve extra call in case of "happy path". Although #include?
is more efficient than #[]
, when you end up having those two - it worth to keep #[]
only (at least in this particular place). Just to explain:
class Headers
def include?(name)
@headers_array.any? { |key, val| key == name }
end
def [](name)
values = @headers_array.each_with_object([]) { |(k, v), a| a << v if k == name }
return nil if values.empty?
return values.first if 1 == values.count
values
end
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Actually what one needs is something equivalent do Enumerable#find
. As Headers is Enumerable, this shouldn't be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. You can call find on Headers instance:
HTTP.get("https://ya.ru").headers.find { |key, val| "Content-Length" == key }
# => ["Content-Length", "9578"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact I think toward changing the way headers are kept internally, that will make all operations on Headers fast and efficient. Right now it's an array of [key, val]
arrays.
The only good thing about this format is that it allows "recreate" headers list as is. BUT. We normalize header names, so it's not actually guaranteed. So I lean towards changing data under the hood to be Hash{String, Array<String>}
. That will make most operations O(1)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing is that if that is gonna happen (change to underlying data structure) it will be released only as next major release due to introduction of backward incompatible changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might also try extracting the timeout code into a separate gem...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was pretty sure @zanker was doing something related to that, although I might misunderstood something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure he abandoned his efforts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Just out of curiosity, what's the idea of normalizing the header key?
@HoneyryderChuck this still looks like good stuff (at least for 3.0). Mind rebasing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor tweaks and I'll be happy to merge!
lib/http/errors.rb
Outdated
# Header value is of unexpected format (similar to Net::HTTPHeaderSyntaxError) | ||
class HeaderError < Error; end | ||
|
||
# @deprecated will be removed in v3.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we're gonna release this PR as part of v3.0.0
it worth to change this to be will be removed in v3.1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if it is to be removed in 3.0.0 and this PR is about 3.0.0, then why not just go and remove support? I'd save this tweak to upstream though, it's beyond the scope of the PR IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm. I guess I'm agree. Just remove this deprecated class then.
lib/http/response/body.rb
Outdated
@@ -53,7 +53,10 @@ def to_s | |||
@streaming = false | |||
@contents = String.new("").force_encoding(encoding) | |||
|
|||
while (chunk = @stream.readpartial) | |||
length = @length || Float::INFINITY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should follow your initial idea of limiting amount of data to given Content-Length when available and should be respected according to RFC.
can I make a few suggestions regarding rubocop? Because, if you notice the ammount of useful commits against rubocop tweaks in this PR, you can realize how annoying that might have been. I get the usefulness of maintaining a standard, but there are some inconsistencies which make one lose focus. Main one being, the hash-rocket insistence. Main reasons are:
# syntax error
def bla(:a => nil, :b => nil)
# if I don't call #bla as this, rubocop complaints
bla(:a => 1, :b => 2) I also suggest dropping this spec given that minimum requirement of this gem is ruby 2.0, and the "new" hash syntax exists already since 1.9 . |
Will merge this today. |
hopefully you make it, as "before the weekend" just passed ;) not to sound like an a-hole, but I've been mostly fighting conflicts and rubocop for the last week, which is time and nerve-consuming, for a gem I haven't been using lately. |
@ixti remember to push master to We should probably also add some README text to clarify the version numbers and development state of each branch (I can take care of that) @HoneyryderChuck I'm 👍 on moving to the new hash syntax across the board, @ixti is the detractor there. re: RuboCop, are you aware of |
This is a squashed commit of PR: #368
@tarcieri I actually wasn't, thx for the tip 👍 |
I have squashed and merged this: cb44c5d |
@tarcieri 2-x-stable was actually created as soon as we released v2.0.0, and now it's in sync with v.2.2.1, while master contains this PR |
Re: hash syntax. Yes. I am strongly against new hash syntax. |
@ixti yeah, just making sure it was up-to-date before merging this. I'll add some clarifying information to the READMEs tonight. re: hash syntax, I'm only weakly in favor of the new syntax, so I guess your strong objection wins 😜 |
@ixti respect your opinion, and as one of the moderators of this project, your counts the most. In terms of creating arbitrary hashes, I have to say that I don't have a strong enough counter-argument, since as stated in the linked issue, you can't represent non-symbol types with the new syntax. However, I think one could "relax" the policy and open an exception, namely when you pass the options directly in the method signature. From what I understood, you're not against kwargs. I'd suggest "enforcing" them in the internal http API (at least), as it adds argument-not-type checks on method call which map to more meaningful error messages, and will remove options hash custom handling like this one bit here (see #fetch calls). Makes a maintainer's life much easier and lessens "primitive obsession". That being said, I think that insisting in the old hash-rocket for method calls creates confusion, as there is dissonance between defining a method and calling a method: def person(first: , last: )
...
# same
person(first: "Dirty", last: "Harry")
# not same
person(:first => "Dirty", :last => "Harry") To summarize, I agree with you in the hash thing, but the fact that you can't define kwargs in an hash-rocket-compatible forces you to handle the hash syntax inconsistency one one or another. The only way to prevent this would be not to use kwargs at all, but I hope I made a strong argument about why you should use kwargs. And therefore, I think that the new syntax could be accepted as long as the new syntax could only be used in method calls directly and not to instantiate hashes. Whether rubocop handles such an exception, that's another topic. Those are my 2 cents. As stated, I respect the logic of the choice. |
@HoneyryderChuck We're all for kwargs. It's just we dropped Ruby relatively not so long ago, so nobody was working on rewriting internal API. I believe that's a great idea (to get rid of opts hashes) for 3.0.0 release. I agree with you that current restriction introduces a small confusion (in method calling vs method definition), but I find it not so bad, after all method definition differs from method calling in general, e.g.: def say(what = "wut?")
# ...
end
# you can:
say("hi!")
# but can't (obviously)
say(what = "hi") So I prefer to look at this particular problem (of kwargs definition/usage difference) through this prism :D Relaxing of rubocop on has syntax will add extra burden on those who review to check styling as well as one will have to verify if used Hash is not a hash but kwargs. |
I think, at least, we can enable the |
We should start favoring keyword arguments See: #368 (comment)
pkgsrc changes: - sort DEPENDS Upstream changes (from CHANGES.md): ## 3.0.0 (2017-10-01) * Drop support of Ruby `2.0` and Ruby `2.1`. ([@ixti]) * [#410](httprb/http#410) Infer `Host` header upon redirects. ([@janko-m]) * [#409](httprb/http#409) Enables request body streaming on any IO object. ([@janko-m]) * [#413](httprb/http#413), [#414](httprb/http#414) Fix encoding of body chunks. ([@janko-m]) * [#368](httprb/http#368), [#357](httprb/http#357) Fix timeout issue. ([@HoneyryderChuck])
should take care of #357